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Summary. This paper presents a dynamic linear model for modeling hourly ozone concentrations over 
the eastern United States. That model, which is developed within an Bayesian hierarchical framework, 
inherits the important feature of such models that its coefficients, treated as states of the process, can 
change with time. Thus the model includes a time-varying site invariant mean field as well as time 
varying coefficients for 24 and 12 diurnal cycle components. This cost of this model's great flexibility 
comes at the cost of computational complexity, forcing us to use an IVICMC approach and to restrict 
application of our model domain to a small number of monitoring sites. We critically assess this model 
and discover some of its weaknesses in this type of application. 
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1. Introduction 

This paper presents a model for the spatio-temporal field of hourly ozone concentrations for subre- 
gions of the eastern United States, one that can in principle be used for both spatial and temporal 
prediction. It goes on to critically assess that model and the approach used for its construction, 
with mixed results. 

Such models are needed for a variety of purposes described in Ozone (2005) where a comprehen- 
sive survey of the literature on such methods is given, along with their strengths and weaknesses. 
In particular, they can be used to help characterize population levels of exposures to ozone in 
outdoor environments, based on measurements taken at often remote ambient monitors. 

These interpolated concentrations can also be used as input to computer models that incor- 
porate indoor environments to more accurately predict population levels of exposure to an air 
pollutant. Such models can reduce the deleterious eff'ects of errors resulting from the use of ambi- 
ent monitoring measurements to represent exposure. For example, on hot summer days the ambient 
levels will overestimate exposure since people tend to stay in air conditioned indoor environments 
where exposures are lower. To address that problem, the US Environmental Protection Agency 
(EPA) developed APEX. It is being used by policy-makers to set air quality standards under 
hypothetical emission reduction scenarios (Ozone, 2005). Interpolated ozone fields could well be 
used as input to APEX to further reduce that measurement error although that application has 
not been made to date for ozone. However, it has been made for particulate air pollution through 
an exposure model called SHEDS (Burke et al., 2001) as well as a simplified version of SHEDS 
(Calder et al., 2003). 

Interest in predicting human exposure and hence in mapping ozone space-time fields, stems 
from concern about the adverse human health effects of ozone. Ozone (2005) reviews an exten- 
sive literature on that subject. Exposure chamber studies show that inhaling high concentrations 
of ozone compromises lung function quite dramatically in healthy individuals (and presumably 
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to an even greater degree in unhealthy individuals such as those suffering from asthma). More- 
over, epidemiology studies show strong associations between adverse health effects such exposures. 
Consequently, the US Clean Air Act mandates that National Ambient Air Quality Standards are 
necessary for ozone to protect human health and welfare. Thus, spatio-temporal models can have 
a role in setting these NAAQS. 

Ozone concentrations over a geographic region vary randomly over time, and therefore consti- 
tute a spatio-temporal field. In both rural and urban areas such fields are customarily monitored, 
the latter to ensure compliance with the NAAQS amongst other things. In fact, failure can result 
in substantial financial penalties. 

A number of approaches can be taken to modelling such space time fields. Here we investigate 
a promising one that involves selecting a member of a very large class of so-called state space 
models. Section [2] describes our choice, a dynamic linear model (DLM), a variation of those 
proposed by Huerta et al. (2004) and Stroud et al. (2001). Here "dynamic", refers to the DLM's 
capability of systematically modifying its parameters over time, a seemingly attractive feature 
since the processes it models will themselves generally evolve "due to the passage of time as a 
fundamental motive force" (West and Harrison, 1997). However, other approaches are possible 
and in a companion report currently in preparation, the DLM selected here will be compared with 
other possibilities. 

Section [2] introduces the hourly concentration field that is to be modeled in this report. There 
consideration of measurements made at fixed site monitors and reported in the AIRS dataset leads 
to the construction of our DLM. [The EPA (Environmental Protection Agency) changed the AIRS 
(Aerometric Information Retrieved System) to the AFS (Air Facility Subsystem) in 2001.] That 
model becomes the object of our assessment in subsequent sections. To illustrate how to select some 
of the model parameters in the DLM, we use the simple first-order polynomial DLM in Section 
[3] to shed some light on this problem. Moreover, we prove there in a simple but representative 
case, that under the type of model constructed here and by Huerta et al. (2004), the predictive 
variances for successive time points conditional on all the data must be monotonically increasing, an 
undesirable property. Theoretical results and algorithms on the DLM are represented in Sections |4] 
andO The MCMC sampling scheme is outlined in Section HTI The forward-filtering-backward- 
sampling (FFBS) method is demonstrated in Section 14.11 to estimate the state parameters in 
the DLM. Moreover, we outline the MCMC sampling scheme to obtain samples for other model 
parameters from their posterior conditional distributions with a Metropolis-Hasting step. Section 
[5] gives theoretical results for prediction and interpolation at unmonitored (ungauged) sites from 
their predictive posterior distributions. Section [S] shows the results of MCMC sampling along with 
interpolation results on the ozone study. Section [7] describes problems with the DLM process 
revealed by our assessment. We summarize our findings and draw conclusions from our assessment 
in Section m 

As an added note, we have developed software, written in C and R and available online 
( http: / / enviro.stat.ubc.ca[ ) that may be used to reproduce our findings or to use the model for 
modeling hourly pollution in other settings. 

2. Model development 

Although we believe the methods described in this paper apply quite generally to hourly pollution 
concentration space-time fields, it focuses on an hourly ozone concentrations (ppb) over part of 
the eastern United States during the summer of 1995. In all, 375 irregularly located sites (or 
"stations") monitor that field. To enable a focused assessment of the DLM approach and make 
computations feasible, we consider just one cluster of ten stations (Cluster 2), in close proximity to 
one another. However, in work not reported here for brevity, two other such clusters led to similar 
findings. Note that Cluster 2 has the same number of stations as the one in Mexico City studied 
by Huerta et al.(2004). 

The initial exploratory data analysis followed that of Huerta et al. (2004) with a similar result. 
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a square-root transformation of the data is needed to validate the normahty assumption for the 
DLM residuals. [Note that a small amount of randomly missing data were filled in by the spatial 
regression method (SRM), before we began.] The plot of a Bayesian periodogram (Dou et al., 
2007) for the transformed data at the sites in our cluster reveals a peak between 1 pm and 3 pm 
each day with a significant 24-hour cycle for the stations in Cluster 2. We also found a slightly 
significant 12-hour cycle. However, no obvious weekly cycles or nightly peaks were seen. Thus, 
the DLM suggested by our analysis turns out to be a variant of the one in Huerta et al. (2004) ; it 
has states for both local trends as well as periodicity across sites. 

To define the model, let yu denote the square-root of the observable ozone concentration, at site 
Si, z = 1, . . . , n, and time t, t — 1, . . . ,T, n being the total number of gauged (that is, monitoring) 
sites in the geographical subregion of interest and T, the total number of time points. Furthermore, 



let yt — (yit, ■ ■ ■ ,ynt)' ■ n x 1- Then the DLM for the field is 

yt = '^'nPt + Sit[ai)a-i_t + S2t{a2)a.2t + vt (1) 

Pt = Pt-i+wt (2) 

Ckjt = Q;j,t-l+wt"^ (3) 



where vt ~ iV[0, a^Vx], wt ~ N[0, aV^], wt"^ - N[0, a^rfVx^], Vx = exp(-y/A), "Kx, = exp(-y/A,), 
j = 1,2, and ajt = {ajit, . ■ . , ajnt)' ■ nx l,j — 1,2. Here (3t denotes a canonical spatial trend 
and a jit, a seasonal coefficient for site i at time t corresponding to a periodic component, Sjt{aj), 
where Sjt{aj) = cos^ntj /12) + aj sin{ntj /12), j = 1, 2. Note that V = (uy ) : n x n represents the 
distance matrix for the gauged sites Si, . . . ,Sn, that is, Vij = \\si — Sj|| for i,j — 1, . . . ,n, where 
||si — Sjll denotes the Euclidean distance (km) between sites S; and Sj. 

Models (H])-© can also written in the form of a state space model with the observation and 



state equations 

yt = F;xt+i^t (4) 
xt = xt-i+wt, (5) 

where = if3t, an' , a2t'), t^t' = (wt, t^t"'', ^t"'')'> and F[ is given by 

" 1 Suiai) ... 52t(a2) ... " 
1 Su{ai) ... 52t(a2) ... 

1 ... ^it(ai) ... S2tia2) _ 



Note that uit ^ N[0, cr'^W], W being the block diagonal matrix with diagonal entries r^, exp(— V^/Ai), 
and tI exp{—V/ X2) ■ 

Let yi;T = (2/"t' ^i t)'' where y"}j. = (y™, . . . ,2/™) represents all the missing values and 
all the observed values in Cluster 2 sites for i = 1, . . . , T. The model unknowns are therefore the 
coordinates of the vector (A, cr^, a;i:T, 2/™^, ai, 02), iu which the vector of state parameters up to 
time T is xi-t — (xi , . . . , xx), the range parameter is A, the variance parameter is cr^ and finally the 
vector of phase parameters is a = (ai, 02). Let 7 = (t^, r^, Ai, r|, A2) be the vector of parameters 
fixed in the DLM to render computation feasible. 

Specification of the DLM is completed by prescribing the hypcrpriors for the distributions of 
some of the model parameters: 

A - IG{ax,f3x) 
^ IG{a^2,(3^2) 

a ^ N{^l:,K)■ 

Notice that A and have inverse Gamma distributions for computational convenience^ The 
choice of the hyperpriors is discussed in Section [6l 

tX - IG{a,l3) if F = 1/X ~ G{a,l3), where p{y) oc y"-^ exp{-l3y) for a,/3 > 0. 



We express the state-space model in two different ways because of our dual objectives of 
parameter inference and interpolation. For simplicity, we use models (l4])-(l5]) for inference about 
the range, variance and state parameters (see Section [4A|) . and use models ([T])-© for inference on 
the phase parameters (see Section |4J|) and interpolation (see Section [5|). 



3. Parameter specification 

Before turning to the implementation of the approach in the next section, we explore theoretically, 
albeit in a tractable special case, some features of the model. That exploration leads to insight 
about how the model's parameters should be specified as well as undesirable consequences of 
inappropriate choices. Our assessment will focus on the accuracy of the model's predictions. 

This simple model we consider is a special case of the so-called "first-order polynomial model" , 
a mathematically tractable, commonly used model. It captures many important features and 
properties of the DLM we have adopted. 

For i — 0,1, . . . ,n and i = 1, . . . , T, the first-order polynomial DLM is given by 

Vit = Pt+ £it (6) 
A = (3t-i+St, (7) 

where £t = (eot, ■ • ■ N{0, a"^ cxp{—V/ X)), and 6t ~ N{0,aj). Assume (3o ~ N{0,a'^) and 

X,a^,ag and are all currently known. 

The first-order polynomial DLM is particularly useful for short-term prediction since then the 
underlying evolution (3t is roughly constant. Observe that the zero-mean evolution error St process 
is independent over time, so that the underlying process is merely a random walk; the model does 
not anticipate long-term variation. At any fixed time t : 

t 

Pt = /3o + 5I'5fc (8) 

k=l 
t 

Vtt = /3o + ^(5fc+e,fc. (9) 
fc=i 

Consequently, the first-order polynomial DLM has the following covariance structure: 

Var(2/rt) = al+ta^s+a^^ (10) 
Cov(2/,t,yjt) = aj+ta^s +a^,cxp{-d,j/X) {i ^ j) (11) 
Cov{yit,yjs) = ap +mm{t,s}ag [s^t], (12) 

where dij — \ \&\ — Sj| |, for i,j = 0, 1, . . . , n and t, s = 1, . . . ,T. 

This DLM defines a non-stationary spatio-temporal process since for the first-order polynomial 
model to be stationary, the eigenvalues of state transfer matrix, G = Gt in the notation of West 
and Harrison (1997), must lie inside of the unit circle. However, G* = 1 so that this process is not a 
stationary Gaussian DLM. Furthermore, the DLM defined in Section [5] is non-stationary because 
Gt = l2n+i given all the model parameters in ([I])-®. The DLM in ©-([T]) has an important 
property that the covariance functions in pip - (jl2p depends on the time point of min{t, s}, not on 
|t — s| thus confirming our observation of non-stationary. 

We readily find the correlation between yn and yjs to be 

„ , . ag+to^^+agexp(-d,,/A) 

GoT:[yu,yjt) = 2 , . 2 , — 2 T 3) (13) 

afj + minjt, s}ag 

CoT{yu,y,s) = " I , i^^i) (14) 
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where i, j = 0, . . . ,n and s,t = 1, . . . ,T. 
Remarks. 

1. The correlations in (fTH]) and have the foUowing properties when i ^ j: 

(i) 

Coi{yit,yjt) > Gor{yu,yjs) (15) 

for s ^ s,t — 1, . . . ,T and 

(ii) 

Coi{yit,yjt) - CoT{yit,yjs) (16) 
is a monotone increasing function of |t — s|. 

Thus for any fixed time point t, Cor(j/ij, j/j^) as a function of s attains its maximum at s = t 
and decreases as \s — t\ increases. 

2. By (fTS]). CoT{yit,yjt) — > 1 as t ^ oo for i ^ j, i,j G {0,...,7i}. That property seems 
unreasonable; the degree of association between two fixed monitors should not increase as an 
artifact of a larger time t. That suggests a need to make some of the model parameters, say 
ag, depend on time. More specifically, suggests making taj — 0{1) stabihze CoT{yit,yjt)- 
Carrying this assessment further, for any two sites in close proximity, i.e. for dij ~ 0, 

al + taj + al 
Cor(y.„y,,)^ ^2 ^,^2^^, -1, 



a result that seems quite reasonable. For two sites very far apart so that — > oo, 
Cor(yri,yjt) 



a} + taj al + 0(1) 



_ '13 

(jI + taj + cr| al + 0(1) + cr| 



This correlation should be close to 0. In other words, we should have cr^ + 0(1) ^ cr^. A 
sufficient condition for this to hold is cr? <C cr^ and ta"^ = 0{1) <C cr^. 



The key result, (|13[) . suggests a simple but straightforward way to adjust the model parameter a 



2 



according to the size of T, namely, to replace it by (tI/T. That choice is empirically validated in 



Section El 

We turn now to study the behavior of the predictive variances in the first-order polynomial 
DLM that helps us understand the interpolation results. To that end consider the correlations of 
responses at an ungauged site Sq with those at the gauged site sj, j g {1, . . . , n}, respectively. Note 
that both (fT5)) and (fT6)) hold for z = 0. The properties of the correlation structure in p^ - p6)) . 
lead us to the conjecture that the model's predictive bands should increase monotonically over 
time as more data become available, in the absence of restrictions on tal = 0(1) suggested above. 
Furthermore, even conditioning on all the data, the predictive bands should also increase over time. 
In support of these conjectures, we prove that they hold in a simple case where n — 1 and T = 2 
in ©-(El). 

Theorem 1. For the first-order polynomial DLM in (0)-(Q) with n = 1 and T = 2, assume 
the prior for (3o to he N{0, cr|). The joint distribution of y — {yoi, yn, yo2j 2/12)' is -^(0, S), where 

S = (cr? + cr|)l4'l4 + block~diagonal{(jl exp(-y/A), cr|l2'l2 + exp(-y/A)}, 



Ijj. being the k x 1 vector of Is (k — 1,2, . . .). Then we have the following predietive conditional 
variances: 

Var{yoi\yii) = „2 , „2 , „2 5 (l'^) 



[al + 2al + alf (ag + 2ag + a? exp(-rfoi/A))2 



Var(y,,\v,,) ^ ^"^'^"^ ' ^^^^ , ^ ^^^"^ ^"^^"^^ ; (18) 



where 



Ml 

Var{yQi\yii,yi2) = (19) 
M2 

^ar (yo2 1 2/1 i,yi2) = (20) 



A = {al + <j} + al){a} + 2al + <jl)-{al + al)\ (21) 

Ml = (a2+2a| + fT2){(a2+a| + fT2)2-(a2+a2 + a2exp(-doi/A))n 

^2(^2 ^ ^2)2(^2 _ ^2 exp(-doi/A)), (22) 



and 



M2 = (a2+cT| + a2){(a2+2a| + a2)2-(a2^2a2^a2cxp(-rfoi/A))'} 

-2(a2 + ^2)2(^2 _ ^2 cxp(-doi/A)). (23) 

For this simple case, we would expect the predictive variance of ?/oi based on more data collected 
over time to be no greater than that of j/oi based on less, that is, 

^ar(yoi|2;ii) > Var{yoi\yn,yi2) 

and 

Var{yo2\yi2) > Var{yo2\yii,yi2). 

Moreover, we would expect that, based on the same amount of data, the predictive variance of yoi 
would be no greater than that of yo2, that is, 

V ar {yoi\y II, y 12) < Var{yo2\yii,yi2). 

Dou et al. (2007) prove these conjectures and provide other comparisons of these predictive 
variances. We conclude that the predictive variance function is a monotonic increasing function of 
time t based on the same set of data. It decreases when more data or equivalently, more time is 
involved. Furthermore, the difference between these predictive variances decreases as t increases. 
It increases with time even when conditioning on the same dataset. 

Theorem 2. For the first-order polynomial DLM in Theorem]^ we have the following prop- 
erties of the predictive conditional variances: 

<^ei<^P + ^5)'(1 - exp(-doi/A))2 
Var(yoi\yii) - Var(ym\yii,yi2) = 2 , — — 2^ ^ ^5 (24) 

a^{aj + a^)2(l - exp(-doi/A)2) 
Var(yQ2\yi2) ~ Var(yQ2\yii,yi2) = ..2,02, — T\ - (^5) 

v / I 1 T/ / I ) ^eO"l(l-exp(-rfoi/A))2 

Var(yo2\yii,yi2j - Var(yoi\yii,yi2j = r > 0; (26) 
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Var{yoi\yii) - Var(j/oi 2/12) > Var{yo2\yi2) - Var{yQ2\yii,yi2); 
Var{yQ2\yi2) -Var{yQi\yii) < Far(?/o2|yii, ^12) - Far (yoilyii, ^12)- 




(27) 



(28) 
(29) 



As an immediate consequence of (|26p . the predictive variances increase monotonically at suc- 
cessive time points conditional on all the data. That leads to monotonically increasing coverage 
probabilities at the ungauged sites, an interesting phenomenon discussed in Section [T] There we 
will also discuss the lessons learned in this section in relation to our empirical findings. 

Next, we present a curious result about the properties of the above predictive variances that 
may explain some of their key features. This result concerns these predictive variances as functions 
of A, doi or a^. Part of its proof is included in Appendix lA.il 

Corollary 1. The predictive conditional variances in |J7[ )-(l ^j increase as doi increases, or 
a1 increases, or A decreases. 

Thus, keeping two parameters fixed, these predictive conditional variances are monotone functions 
of the remaining one. Therefore, the DLM can paradoxically lead to larger predictive variances 
when conditioning on more data. For example, in the case n = 1 and T = 2, applying the DLM 
model with only the data at T = 2 yields the predictive variance Var* [yQ2\y 12) ^ which is exactly 
the same as Var{yai\yii) in (jl7[) . This predictive variance is smaller than V^ar(j/o2|yii, 2/12) in (|20p . 
which is based on more data, under certain condition specified in the next corollary. 

Corollary 2. For the first-order polynomial DLM in Theorem 1, 



The behavior suggested by Corollary [5] is actually observed in our application (see Section [7|) . 
4. Implementation 

This section very briefly describes how to implement our model using the MCMC method, more 
specifically, the forward-filtering-backward-sampling algorithm of Carter and Kohn (1994). The 
details are given by Dou et al. (2007). 

4.1. Metropolis-within-Gibbs algorithm 

The joint distribution, p(A, (7^, a;i:T, fli, a2|?/°.7^), is the object of interest. Here?/°T = (yii---7yT) 
represents the observation matrix at the n gauged sites up to time T. Moreover, xi;t = {xi , . . . ,xt) ■ 
(2n-|- 1) X T is the vector of state parameters at the n gauged sites until time T. For simplicity, the 
values of 7 are fixed but the problem of setting them will be addressed below. Additional detail 
can be found in Appendix I A. 2 1 

Since that joint distribution does not have a closed form, direct sampling methods fail, leading to 
the use of the Markov Chain Monte Carlo (MCMC) method. A blocking MCMC scheme increases 
iterative sampling efficiency, three blocks being chosen for reasons given in Dou et al. (2007): 
(A, CT^, a;i:T), J/iV ^'^d (01,02). More precisely we can: 

(i) sample bom p{xi-T, X,a^\ai,a2,yi:T) 

(ii) sample from p(?/™y|A, cr^, a;i:T, fli, a2, 2/i t) ^^'^ 
(ii) sample from p{ai, a2\xi;T, A, a'^,yi-T). 




(30) 



Since p(A, a^, xi:T|ai, 02, j/i:t) has no closed form, the full conditional posterior distribution 
of xi;T is obtained by Kalman filtering and smoothing, in other words, by the FFBS algorithm. 
Assuming an inverse Gamma hyperprior for cr^, the conditional posterior distribution of given 
the range and phase parameters is also inverse Gamma distributed with new shape and scale 
parameters. Note that 

p{X,a^,xi;T\ai,a2,yi:T) = p(A|ai, a2, ?;i:t)p(o'^|A, ai, a2, ?;i:t) 

Xp(x1:t|A, (T^,ai,a2,?/l:T), (31) 

indicating that we can sample iteratively from the three conditional posterior distributions on the 
right-hand-side of ([3T|) to obtain samples from p{X, , xi;T\ai, 02, Ui-.t)- However, p{X\ai, 02, yi-.r) 
has no closed form, leading us to sample A by a Metropolis-Hasting chain within a Gibbs sampling 
cycle, an algorithm as described in the next three subsections. 



Sampling from p(A, ct^, a;i:Tlai, 02, 2/i:t) 

To sample {X,a^ ,Xi;t) from }3(A, ct^, Xi:T|ai, 02, ^iit), we use the block MCMC scheme. Because 
of (PT|) . we could ideally iteratively sample A from p(A|ai, 02, ?/i:t), from p{a^\X,ai, 02, Ui-.t) 
and xi;T from p(xi:t|A, ct^, oi, 02, 2/i:t)- However, because we do not have a closed form for the 
posterior density of p(A|ai, a2, Ui-.t), we use instead the Metropolis-Hasting algorithm to sample A, 
given the data, from the following a quantity that is proportional to its posterior density, that is. 



p{X\ai,a2,Ui-.T) oc p(A) J]^ IQt 



/3- 



-, -(TiT/2+a) 



(32) 



Since we cannot compute the normalization constant for p(A|ai, 02, ?/i:t), the Metropolis-Hasting 
algorithm is used. The proposal density, q{., .), is selected to be a lognormal distribution, because 
the parameter space is bounded below by 0, making the Gaussian distribution inappropriate. As 
MoUer (2003) points out, this alternative to a random walk Metropolis considers the proposal 
move to be a random multiple of the current state. From the current state A^^^^^(j' > 1), the 
proposed move is A* = X'-^-^^e^, where Z is drawn from a symmetric density, such as normal. 
In other words, at iteration (j), we sample a new A* from this proposal distribution, centered at 
the previously sampled A*-^"^-* with a tuning parameter, r^, as the variance for the distribution of 
Z. Gamerman (2006) suggests the acceptance rate, that is, the ratio of accepted A* to the total 
number of iterations, be around 50%. We tune to attain that rate. If the acceptance rate were 
too high, for example, 70% to 100%, we would increase r^. If too low, for example, to 20%, we 
would decrease , to narrow down the search area for A* . 

The Metropolis-Hasting algorithm proceeds as follows. Given A'^-'"^-', 
where j > 1 : 

• Draw A* from LN{X'^i-^\t^). 

• Compute the acceptance probability: 



,0-1) 



and Ui-.T^\ 



k(A(^-i\A*) =mini 1, 



p{X*\ar'\ar'\y'{'-.T ')M^'^'\y) 
p(A0-i)|4^-^\ar'\y|-?T'^)/'Z(A*,A0-i)). 



• Accept A* with probability a{X^^ , A*). In other words, sample u ^ U[0, 1] and let A'-') = A* 
if A* < u and A^-'^ = A*^^^^ otherwise. 



We run this algorithm iteratively until convergence is reached. 

Next, we sample given the accepted A's, ai's, 02 's and Ui-T- The prior for is chosen to 
be an inverse gamma distribution with shape parameter a and scale parameter /3. The posterior 
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distribution for cr^ is also an inverse gamma distribution, but with a shape parameter a + ^ and 

a scale parameter /3 + | X^^i ^t'Qt^^t- 

We now sample xi:t given the accepted A's, cr^'s, phase parameters and yi-.r, using FFBS. West 
and Harrison (1997) propose a general theorem for inference about the parameters in the DLM 
framework. For time series data, the usual method for updating and predicting is the Kalman 
filter. Dou et al. (2007) present a FFBS algorithm (similar to the Kalman filter algorithm) to 
resample the state parameters conditional on all the other parameters and observations as part of 
the MCMC method for sampling xi;t = (xi, . . . ,xt) from the smoothing distribution 

p(xt|A, (T2,ai,a2,?;i:T)- 

The initial state parameter is given by 

(xo|yo,e) - A^[mo,<T2Co], (33) 

where yo being the initial information, with mo and Cq known. Later in Section [SI we consider 
how to set them for Cluster 2 AIRS dataset (1995). Let 9 = (A, cr^, ai, a2, 7). Now suppose for 
expository purposes, that all the prior information has been given and 0's coordinates are mutually 
independent. 



Sampling from piy^^lX, c^^ xi^t, ai, 03, ylj,) 

MCMC can be used to fill in missing values at each iteration. To see how, note that at any fixed 
time point t, after appropriately defining a scale matrix i?f, we can rewrite the observation vector 
yt as follows: 

' 2/r 



RtYt 



Vt 



where : x 1 denotes the missing response(s) at time t and y° : {n — rit) x 1 the observed 
response(s) at t. Notice that "o" represents "observed" and "m" , "missing" . 

Let Rt = (e„i , . . . , e„j , efc^ , . . . , efc„_„^ )', where {s„^. : j = 1,. . . ,t} represents the set of gauged 
sites containing missing values at time point t, {s^^. : j = 1, . . . ,n — rit} the set of gauged sites 
containing observed values at time t, for alH = 1, . . . , T; and ej — {eji, . . . , eju)' n x 1 such that 
Cjk = Ij=k for fc = 1, . . . , j and j G Z+. 

We already know that 

(yt|A,a2,Xt,a) ^ 7V[F;xt, exp{-F/A}], 
so that RtYt is also multivariate normally distributed, that is, 

(i?tyt|A,a2,xt,a) - ((y[", yf )'|A, a^, xt, a) ^ ^[/It,St], 

where 

Jit = RtFty^t 

= cr2i?texp{-F/A}i?;. 

We can also partition /It as jit — (/i™', ')'; where pi™ : rtj x 1 and Jl° : {n — rit) x 1- Similarly, 
we have 



where EJ""^ : nt x rit, : rit x {n — ^t) and : [n — rit) x (n — rit). 
By a standard property of the muhivariate normal distribution, we have 

(yr|A,a2,Xt,a,y°) ~ N[^xl\l:T], (34) 



where 

M =tJ-t i^t ) (yt-Mt)> (35) 

and 

E** = E™" - (36) 

fort = l,...,r. 

At each iteration, we draw y[" from the corresponding distribution (j34p at each time point t 
and then we can write the response variables as yi:T — {y^T^ Vi-t) where y™-^ — (y™, ■ • ■ , Vt) s-i^d 

Sampling from p(ai, 02] A, ct^, xitT, yi-.r) 

We now present our method for sampling the phase parameters a = (ai, 02)' from its full conditional 
posterior distribution, that is, p(a| A, cr^, a:i:T, 2/i:t), by using the samples of A, and xi-t- For 
simplicity, we use the notation for models ([T])-(l3|) in this section. 

We then sample the constant phase parameters conditional on all the other parameters and 
observations. Suppose a = (01,02)' has a conjugate bivariate normal prior with mean vector 
^° = (/xio,/Z2o)' and covariance matrix YP . Then the posterior conditional distribution for a is 
normal with mean vector /^* and covariance matrix S*, where /i* and S* can be obtained from 
equations given in Dou et al. (2007). 

We will not use a non-informative prior such as p(a) oc 1 for a since that choice can lead to 
non-identified posterior means or posterior variances. In fact for that choice we find the posterior 
conditional distribution of a to be normal with mean vector /i — (/ii, ^2)' and covariance matrix S 
from equations given in Dou et al. (2007) along with the elements of S, where E can be singular 
for any t = 12k {k G Z). Hence, we obtain the extreme values at times 12, 24, . . . , 2880, that 
invalidates the assumption of constant phase parameters across all the time scales when we sample 
from its full conditional posterior distribution. 

For fixed values of A, and xi;t, we sample the model parameter a — (oi, 02) from N{^* , E*) at 
each time point, and then obtain the "sample" of a at this iteration by the median of these samples 
across all the time points, under the assumption that (oi, 02) are constant phase parameters in the 
models Q-®. 

4.2. Summary 

The MCMC algorithm we use here resembles that of Huerta et al. (2004), one difference being 
that we unlike them, use all the samples after the burn-in period, not just the chain containing 
the accepted samples. We believe the Markov chains of only accepted results will lead to biased 
samples, thereby changing the detailed balance equation of the Metropolis-Hasting algorithm. 
The above algorithm we use for Cluster 2 AIRS dataset is summarized as follows: 

1. Initialization: sample 

x[% - iV(mo,a2^'^Co). 

2. Given the (j - 1)* value X^^-^\ a^^^^^\ y™T^^"^^ 4^"^^ and the observations 

(1) Sample (A^ , tr^^^^ x^^^) from p(A, a^, xi:T|a?"'\ 4'~'\ yi-r'^), where 

_ („.m (i-1) o N 
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(i) • Generate a candidate value A* from a logarithm proposal distribution (/(A^^"^^ , A), 

that is, LN{X'-J-^\t'^) for some suitable tuning parameter r . 

• Compute the acceptance ratio a{\^^^^\ A*) where 

a(A(^-i) , A*) = min 1, n • 

• With probability a(A^''^^-' , A*) accept the candidate value and set A'^-'^ = A*; 
otherwise reject and set A'^-'^ = A'^-'^^^. 

(ii) Sample a2(^'^ bom p{<j^\X(j\a['-^\4'-'\y^^~^^). 

(iii) Sample x^}^ from p{xi.,T\X(^\a^'-'\a^r^\a^i'^\y['-^^). 

(2) Sample y™^,^^') iiom p{y';}r^\X'^^\a'^'^^\x['.}r,a[^^^\a^^'~^\y°.j,). 

(3) Sample (aj^'\4^'^) from p{ai,a2\X'^^\a^''^\x[%,y^^J^), where yjf^ = (y^T^^^ y?:T)- 

3. Repeat until convergence. 

We have developed software to implement the DLM approach of this section. To enhance the 
Metropolis-within-Gibbs algorithm, we augment the R code with C to speed up the computa- 
tion. The current version, GDLM.1.0, is freely available at jhttp:/ /enviro.stat.ubc.ca for different 
platforms such as Windows, Unix and Linux. 



5. Interpolation and prediction 

This section describes how to interpolate hourly ozone concentrations at ungauged sites using the 
DLM and the simulated Markov chains for the model parameters (see Section SJ. In other words, 
suppose si , . . . , Su are u ungauged sites of interest within the geographical region of Cluster 2 sites 
(excluding the possibility of extrapolation). The objective is to draw samples from 

Piyl.rl^^ <^^,xi:T, fli, 02, yi-.r), 

where yf.rp = (y^, . . . , y^) : 1 x T and yf denotes the unobserved square-root of ozone concentra- 
tions at the ungauged site s and time t, for t = 1, . . . ,T and for s G {si, . . . ,Su}. Let (aff,a2j) 
denote the unobserved state parameters at site s and time t. The DLM is given by 

y^now ^ l„+i'/3t -I- S'it(ai)ait"™ + 52t(a2)Q!2t"™' + ft""", (37) 

where yt""" = (2/|,yt')'> "t"™ = (aft, ait', a2t, a2t')': and i/t""" iV(0, cr^ exp(-y"™/A)). 

In the following two subsections, we illustrate how to sample the unobserved state param- 
eters {(afj,a|j) : t — 1,...,T} from the corresponding conditional posterior distribution, and 
demonstrate the spatial interpolation at the ungauged site s. 



Sampling the unobserved state parameters 

We first sample given a|j_]^, ajt and ckj.t-i,j — 1,2. From the state equation ([5]) for ajt"™, 
we know that the joint density of ajj and ajt follows a normal distribution, with covariance matrix 
ct^tJ exp (— y°™/Aj), where T/"™ denotes the distance matrix for the unobserved station and the 
monitoring stations. The conditional posterior distribution, 

P(ajt |aj,t-i, ^, o-^, /3t, ait, a2t, fli, a2, 2/1:t), 



is derived in Appendix I A. 3 1 




CD 
O 



Longitude 



Fig. 1. Schematic representation of the locations of ten gauged sites in Cluster 2 and the randomly chosen 
six ungauged sites. (Number = Cluster 2 sites and letter = ungauged sites.) 



Spatial interpolation at ungauged sites 

We interpolate the square-root of ozone concentration at the ungauged sites by conditioning on 
all the other parameters and observations at the gauged sites. As above, and yt are jointly 
normally distributed as a consequence of the observation equation. The predictive conditional 
distribution for y^, that is, p(y||af j cr^j ctit, Q;2tj ai, 02, 2/1:t), is given in Appendix I A. 31 



6. Application 

This section applies our model to the hourly ozone concentration field described above. Six un- 
gauged sites were randomly selected from those available within the range of the sites in Cluster 
2 to play the role of "unmonitored sites" and help us assess the performance of the DLM. The 
geographical locations of these six ungauged sites, represented by the alphabetic letters. A, . . . ,F, 
are shown in Figure [1] along with the sites in Cluster 2. 



6. 1 . MCMC sampling 

This subsection presents a MCMC simulation study in which samples are drawn sequentially from 
the joint posterior distribution of the model parameters in the DLM. 

Initial settings 

Following Huerta et al. (2004), we use the following initial settings for the starting values, hyper- 
priors and fixed model parameters in the DLM: 

• The hyperprior for A is /G(l,5) and for a^, /G(2, 0.01). The expected value of IG{1,5) is 
00 and so are both of the variances of p{X) and p{cr^)- These vague priors for A and are 
selected to reflect our lack of prior knowledge about their distributions. 

• The initial information for xq, the initial state parameter, is assumed to be normally dis- 
tributed with mean vector mo = (2.85, —0.751^, —O.OSl^J' and covariance matrix afCa, 
where ^ /G'(2,0.01) and Cq is a block diagonal matrix with diagonal entries 1, 0.011^ 
and 0.011' 
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Fig. 2. Traces of model parameters with the number of iterations of the IVIarkov chains. The parameters are: 
(a)-A, the range parameter; (b) -a^, the variance parameter; (c) -ai, the phase parameter with respect to 
the 24-hour periodicity; and (d) -02, the phase parameter with respect to the 12-hour periodicity. 



Table 1. Posterior summaries for A, a^, ai 
and a2. 



Quant ilc 


A 




ai 


a2 


2.5% 


69.29 


1.19 


2.42 


9.77 


Median 


71.83 


1.21 


2.45 


9.80 


97.5% 


75.37 


1.24 


2.48 


9.84 



• The hyperprior for a is a bivariate normal distribution with mean vector fi° — (2.5, 9.8)' and 
a diagonal matrix S° with diagonal entries 0.5 and 0.5. 

• Some of the model parameters in the DLM are fixed as follows: — 0.02, = 0.0002, 
r| = 0.0004, Ai = 25 and A2 = 25. 



Monitoring the convergence of the Markov chains 

Figure[2]shows the trace plots of model parameters A, cr^, ai and 02 with the number of iterations 
of the simulated Markov chains where the total number of iterations is 4, 268. The burn-in period 
is chosen to be 2, 269 and all the remaining Markov samples are collected for posterior inference. 
The acceptance rate is approximately 62%. We observe that the Markov Chain converges after a 
run of less than five hundreds iterations. 

Table [1] displays the median and 95% quantilc from the simulated Markov chains for the model 
parameters A, tr^, ai and 02- 



6.2. Spatial interpolation 

This subsection assesses the model's performance by comparing the interpolated values at the un- 
gauged sites. A, . . . ,F, with the measurements made there. We use the entire dataset to assess the 
performance of the interpolation results. Table [5] shows the coverage probabilities of the credibility 
intervals (or "credible intervals" for short) for these six ungauged sites at various norminal levels. 



Table 2. Comparisons between the empirical credible probability and 
the nominal levels at the ungauged sites A,...,F. 



Nominal Prob (%) 


A 


B 


Coveraj 
C 


re Prob.s (%) 
D E 


F 


95 


94.9 


96.9 


96.5 


99.7 


96.1 


98.1 


90 


91.9 


93.7 


93.5 


99.4 


93.6 


96.8 


80 


84.8 


88.5 


88.2 


97.7 


89.6 


94.3 


70 


78.7 


83.5 


83.3 


94.0 


85.8 


90.6 


60 


73.0 


78.5 


77.1 


89.7 


81.6 


86.6 


50 


65.2 


71.5 


70.4 


85.6 


76.1 


81.4 


40 


55.2 


61.4 


61.0 


79.2 


67.9 


74.7 


30 


42.2 


47.6 


47.5 


69.6 


54.9 


64.4 





Fig. 3. Interpolation at Ungauged Site D for four successive weeks beginning from IVlay 14, 1995. The 
square-root of hourly ozone concentrations are plotted on the vertical axes, hours on the horizontal axes. 
The solid lines represent the predicted median of the responses, the dashed lines represent the 95% predic- 
tive intervals for the predicted square-root of ozone concentrations and the solid dots represent the obser- 
vations at Ungauged Site D. 



Generally, the coverage probabilities at the ungauged sites exceed their nominal levels indicating 
that the error bands are too wide. 

Among these six ungauged sites, Site D has the highest coverage probability seen in Tabled] 
This may be because of D's nearness to a close "relative" among the gauged sites, namely. Site 1. 
That would be consistent with our assumption that the spatial correlation is inversely proportional 
to the intersite distance. At the same time, these unsatisfactory large coverage probabilities point 
to a deficiency of the DLM. 

To explore this issue further, we compared the values predicted for Ungauged Site D from May 
14 to September 11, 1995 and the measurements made there. Figures[3]and in more detaillH which 
exemplify results reported in more detail by Dou et al. (2007), depict the results for the first four 
weeks and the last week of that period, respectively. 

Furthermore, Table [3] shows for all the ungauged sites, the close relatives they have among the 
Cluster 2 sites that lie within a radius of 100 km, the corresponding global circle distance (GCD) in 
km, and along with the average of their correlations. This table confirms that indeed D does enjoy 
the highest correlation with its relative. That relationship is further explored in Figure [5] where 
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Fig. 4. Interpolation at Ungauged Site D from the 17th week to the 120th day. The square-root of hourly 
ozone concentrations are plotted in the vertical axes, hours on the horizontal axes. 



Table 3. Close "relatives" of the ungauged sites, their global 
circle distance (km) and the average of their correlation with 
their associated gauged sites. 



Ungauged Site 


Relative(s) 


GCD (km) 


Pearson's r 


A 


2 


66.6 


0.73 


B 


2 


62.5 


0.74 


C 


2 


35.5 


0.84 


D 


1 


11.0 


0.95 


E 


2 


38.0 


0.70 


F 


(7,8) 


(18.6,44.9) 


(0.84, 0.82) 



Gauged Site 1 



Fig. 5. Scatterplot for the square-root of ozone concentrations at Ungauged Site D and its close relative, 
Gauged Site 1. The square-root of hourly ozone concentrations are plotted in both vertical and horizontal 
axes. 

we see a strong linear relationship between Sites D and 1 as our coverage probability assessment 
had suggested. 

In spite of its reliance on the relatives, the DLM does not predict responses at the ungauged 
sites very accurately as illustrated in Figure [H That points to problems with this model which 
will be discussed in the next section. 

7. Discussion 

In general, the DLM provides a remarkably powerful modelling tool, made practical by advances 
in statistical computing. However, its substantial computational requirements still limits its appli- 
cability. Moreover, the very flexibility that makes it so powerful also imposes an immense burden 
of choice on the model. This section summarizes critical issues and includes some suggestions for 
improvement. 

Monitoring MCMC convergence 

Figure [6] represents the trace plots of model parameters A, cr^, ai and 02 of two chains from the 
initial settings in Section [Ql These two chains seem to mix well after several hundreds iterations, 
suggesting at first glance the Markov chains have converged. 
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Fig. 6. Traces of model parameters for a number of iterations of two chains. The parameters are: (a) -A, the 
range parameter; (b) -cr^, the variance parameter; (c)-qi, the phase parameter with respect to the 24-hour 
periodicity; and (d) -02, the phase parameter with respect to the 12-hour periodicity. 



Autocorrelation and partial autocorrelation of the simulated Markov chains 

However, we know that the autocorrelation, as measured by the autocorrelation function (ACF), 
is very important when considering the length of the chain. A highly auto-correlated chain needs 
a long run to yield accurate estimates. Moreover, the partial autocorrelation function (PACF) 
is also an important index for assessing a Markov chain since large values of the PACF at lag h 
indicates that the next value in the chain is dependent on past values, not just on the most recent 
ones. 

Figure [7] shows the histogram, ACF and PACF plots for the Markov chains used in Section 
16.21 after a burn-in period of 1,000. The ACF plots show the As to be highly autocorrelated, 
in other words that the A-chain docs not mix well, potentially leading to biased estimates in 
Section 16.21 Thinning the chain might reduce that autocorrelation. In other words, using every 
fc"* (fc > 1, fc G Z+) A generated by the chain could be used to produce the estimates. However, 
computational challenges make that strategy impractical; we need to use the entire chain. 



Relationship between pairs of A, u^, ai and 02 

Our prior assumptions make the model parameters A, cr^, ai and 02 uncorrelated. Figure 
[8] shows the relationship between the pairs of these parameters as a way of investigating that 
assumption. It seems valid except for the X-a^ pair in graph (a) . That graph shows a weak linear 
association between A and a^, thus pointing to a failure of that assumption for that pair. Since 

determines spatial variability while A determines correlation this relationship seems intriguing. 
Larger values of cr^ tend to go with larger As, i.e., diminished spatial correlation. Why they are 
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Fig. 7. Histogram (left panel), ACF (middle panel) and PACF (right panel) of model parameters of the IVIarl<ov 
chains after a burn-in period of 1, 000. The parameters are: (i) first row: -A, the range parameter; (ii) second 
row: -a^, the variance parameter; (iii) third row: -ai, the phase parameter with respect to the 24-hour 
periodicity; and (iv) last row: -a2, the phase parameter with respect to the 12-hour periodicity. 



(a) 



i. 

(d) 



(b) 



(e) 



(c) 



X 

(f) 



Fig. 8. Scatterplots for the pairs of model parameters: (a) A v.s. ct^; (b) A v.s. ai; (c) A v.s. 02; (d) v.s. ai; 
(e) v.s. a2; and (f) ai v.s. 02. 
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Fig. 9. Scatterplots for A against cr^ for various weel<s, based on tlie l\/ICI\/IC samples using one weel<'s data, 
that is, weel<s 4, 6 and 9, but starting from thie same initial values as those in Section [Ol 

coupled in this way is unknown but it should be accounted for in future applications of this model. 

Time varying As and a^s: empirical coverage probabilities versus nominal credible 
probabilities 

Although we follow Huerta et al. (2004) in assuming the temporal constancy of A and , it is 
natural to ask if those generated by the MCMC method change over time. A variant of this issue 
concerns the time domain of the application. Would the results for these parameters change if we 
switched from one time span to a longer one containing it? A "yes" to this question would pose a 
challenge to anyone intending to apply the model, knowing that the choice would have implications 
for the size of and A. 

To address these concerns we carried out the following studies: 

(i) Study A : Implement the DLM at ungauged sites using weekly data [Wk : fc = 1, . . . , 17). 
Generate Markov chains for A, u^, ai and a2- Obtain the coverage probabilities at each 
ungauged site and week for fixed credibility interval probabilities. 

(ii) Study B : Implement the DLM at ungauged sites using week 1 to week 17 data {Wi-.n = 
{Wi, . . . , Wn}). Estimate model parameters and interpolate the results at those ungauged 
sites. Obtain the coverage probabilities at each ungauged site and week for fixed credibility 
interval probabilities using each week's data. 

(iii) Study C : Fix A^ at week k {k = 1, . . . , 17) using values suggested by the Markov chains 
generated in Study A. Then use these A* — {\\, . . . , A^^} as fixed values in the DLM to reduce 
computation time. In other words, go through all the steps in the algorithm of Section 14.21 
but now using only fixed A*s instead of generating them by a Metropolis-Hasting step. (Note 
that we are then only using Gibbs sampling and an MCMC blocking scheme.) Compute the 
corresponding coverage probabilities using Wi.n at each ungauged site and week for fixed 
credibility interval probabilities. 

Studies A and B are intended to explore the effect of data and time propagation on the inter- 
polation results. Study C aims to pick out any significant difference in the interpolation results 




Table 4. Fixed values of A* in Study C. 



Week 


1 


2 


3 


4 


5 


6 


7 


8 


9 


A* 


54.2 


178.5 


83.7 


405.4 


86.6 


59.7 


199.3 


144.1 


322.7 


Week 


10 


11 


12 


13 


14 


15 


16 


17 




A* 


142.2 


172.7 


187.9 


315.8 


419.0 


99.8 


260.3 


284.8 





Table 5. Summary for the computational time in Studies A, B and C. Time 
is measured in seconds. Ttie total is for a complete summer long MCMC run 
without spatial prediction. 





Time 


(seconds) 


Study 


Data 


Iteration total 


Accept(%) 


Total 


/Iteration 


A 




1,500 


0.82 


17018 


13.8 


B 


VKl:17 


1,000 


0.35 


326782 


932.3 


C 


m:17 


1,000 


1.00 


329349 


329.3 



when using the fixed A* rather than using the Markov samples of As. It is also aimed at finding 
how much time would be saved by avoiding the inefficient Metropolis step. Table |4] shows these 
fixed A*s used in Study C. Table 5 shows the time saved using fixed A*s against the one using the 
Metropolis-Hastings algorithm. 

Figure [H] illustrates the MCMC estimation results obtained in Study A. It plots the Markov 
chains of A and using weekly data. It is obvious that A and vary from week to week, which 
implies that the constant A-cr^ model is not tenable over a whole summer for this dataset. 

Figure [TO] typifies figures in Dou et al. (2007) showing the coverage probabilities for various 
predictive intervals associated with the interpolators in these three studies. The solid line with 
bullets represents the results for Study A, the dotted line with up-triangles for Study B, and the 
dashed line with squares for Study C . These graphs show that the coverage probabilities of Study 
B are similar to that of Study C. This suggests that we could use the entries in Table H] as fixed 
A*s in the DLM to obtain interpolation results similar to those obtained using the Metropolis- 
within-Gibbs algorithm. 

We have studied the prediction accuracy of the simplest DLM, namely, the first-order poly- 
nomial model, in Section [3l As a result, the predictive variances should increase monotonically 
at successive time points conditional on all the 17 weeks' data, in the general DLM setting (see 
Section [3]). The plots exhibit a monotonic increasing trend in the coverage probabilities of both 
Studies B and C. This trend agrees with the graph of the coverage probabilities in Figure [TOl 
Nevertheless, those coverage probabilities of both studies deviate slightly from the expected mono- 
tonically increasing trend at some time points because of the time varying effect of A-cr^ monitored 
in Figure ini 

On the other hand. Study C enjoys significant computational time savings compared with B. 
Table [5] suggests that the computation time of the former is almost 2.8 times faster than the latter. 

Study B shows an intuitively unappealing increase in the uncertainty of interpolation results 
as time increases; coverage probabilities get larger over time as we see in Table (6] This increase 
may be interpreted as saying that for the DLM models, the As and cr^s collected from the data 
should vary over the entire time span of the study, while the prior postulates that they do not vary 
over that time span. The observed phenomenon may also be due to mis-specification of the model 
parameter values 7 = (r^,T^, . . . , A2) (See the initial settings for 7 in Section lOI ). 

Comparing the results of these studies, we find that sometimes, paradoxically, the model gives 
better results using only one week's data rather than all. However, Corollary [2] in Section[3]predicts 
this finding. Because the prior for a\ is /G(2,0.01) the expectation of ct^ is 0.01, implying that 

a'p ~ 0.01 and ~ 0.01 x 0.02. Hence, cr| ^1 + ~ 0.51, which is less than (for example, 

the median of is around 1.21 in Study B and even larger in Study A). By the sufficient and 
necessary condition in Corollary [21 the predictive variance of Study A is less than that of Study B. 
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Fig. 10. Coverage probability versus: (a) 95% nominal level for Ungauged Site D, and (b) 80% nominal level 
for Ungauged Site C. These coverage probabilities are computed for Study A: weekly data (solid bullet with 

solid line); Study B: Wv.n (up-trianglewith dotted line); Study C: Wi-.n but with fixed A* (square with dashed 
line); and Study D: Wi.n but with fixed A* and modified r'y, r'i, and t| (empty circle with solid line). 



Table 6. Coverage probabilities (%) for studies A, B and C at 
Ungauged Sites A, B, and C, at 80% nominal level. 



Ungauged Site 




A 






B 






C 




Study 


A 


B 


C 


A 


B 


C 


A 


B 


C 


Week 1 


66 


65 


72 


80 


78 


89 


82 


80 


84 


Week 2 


73 


71 


80 


76 


78 


83 


79 


81 


85 


Week 3 


63 


73 


82 


82 


86 


91 


80 


87 


93 


Week 4 


57 


74 


81 


66 


83 


88 


75 


87 


89 


Week 5 


53 


70 


82 


68 


83 


90 


59 


83 


88 


Week 6 


73 


80 


88 


75 


83 


89 


83 


89 


93 


Week 7 


69 


88 


90 


80 


92 


94 


80 


93 


97 


Week 8 


66 


89 


93 


66 


90 


93 


71 


92 


95 


Week 9 


63 


82 


88 


84 


90 


94 


77 


91 


96 


Week 10 


61 


87 


92 


75 


93 


96 


74 


94 


98 


Week 11 


58 


86 


89 


77 


93 


94 


68 


91 


95 


Week 12 


69 


90 


92 
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However, notice that cr^ and A vary from week to week in A, which may also lead to the paradox 
observed in the empirical findings of this section. For example, in (b) of Figure I10[ the coverage 
probability of B at the 4"' week is larger than that of A. From the above discussion, we know that 
the predictive variance of A should be less than that of B. However, cr^ of A is larger than that 
of B, leading an inflated predictive variance of A. This feature makes it difficult to compare these 
two predictive variances, but explains the paradox we see in those figures. 

8. Summary and Conclusions 

To assess the dynamic linear modelling approach to modelling space-time fields, we have applied 
it to an hourly ozone concentration field over a geographical spatial domain covering most of the 
eastern United States. To focus that assessment we consider just one cluster of spatial sites we call 
Cluster 2 during a single ozone season. Moreover, we have used a variant of the dynamic linear 
modelling approach of Huerta et al. (2004) implemented through MCMC sampling. 

Our assessment reveals some difficulties with that very flexible approach and practical challenges 
that it presents. We also have made some recommendations on improvement. 

A curious finding is the posterior dependence of A and cr^, in contradiction to our prior as- 
sumption. Although the very efficient method Huerta et al. (2004) propose to sampling these 
parameters is biased, that bias does not appear large enough to account for that phenomenon. We 
also discovered that the assumption of their constancy over time is untenable. 

The coverage probabilities of the model's posterior predictive credibility intervals over successive 
weeks, conditional on all 17 weeks of data, increase monotonically. Counter to intuition, that would 
imply more and more uncertainty as time evolves, an artifact of the modelling that seems hard to 
explain. A pragmatic way around this undesirable property involves incorporating the length of the 
time span of the temporal domain T into the selection of the values of the model parameters, such 
as Ty , Ti and t| . Section [3] studies the correlation structure of the simplest first-order polynomial 
DLM and finds reasonable conditions to impose on those parameters. 

One further Study D tests the proposed constraints on the data. The settings are identical with 
those in Study C except that Ty, and r| are replaced by r^/17, and t|/17, respectively, 

to take account of the longer 17 week time span of our study compared to the one week time 
span of the application in Huerta et al. (2004). Figure [TOl compares Study D with the others. 
Observe that its coverage probabilities behave like those of Study A. This adjustment does seem 
to eliminate the undesirable property of increasing credibility bands of Studies B and C. 

Another possible approach to dealing with the unsuitability of fixed model parameters uses the 
composition of Metropolis-Hasting kernels. In other words, we could include these parameters in 
the Metropolis-Hasting algorithm as in Section 14.11 We can use six Metropolis-Hasting kernels 
to sample from the target distribution 7r{'j\yi:T), updating each component of 7 iteratively, where 
7 has defined in Section O But, not surprisingly that approach fails because of the extreme 
computational burden it entails. However, that alternative is the subject of current work along 
with an approach that admits time varying As and a^s. 

The greatest difficulty involved in the use of the DLM in modelling air pollution space-time 
fields lies in the computational burden it entails. For that reason, we have not been able to address 
the geographical domain of real interest, one that embraces 274 sites in the eastern United States, 
with 120 days of hourly ozone concentrations. In a manuscript under preparation, an alternative 
hierarchical Bayesian method that can cope with that larger domain will be compared with the 
DLM where the latter can practically be applied. 
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A. Supplementary results 

A.1. Results for Section d 

Only the results about the predictive variances of yoil^ii and yoi|yii,yi2 are shown in this ap- 
pendix. The other two cases can be obtained by the same method. Refer to Theorem [1] the 
predictive variance of yai\yii can also be written as follows: 

Var{yoi\yn) = (1 - exp( —))a;<2 



The first partial derivatives of this predictive variances regarding to doi 7 ^ and are given by: 



and 



^^«r(yoi|z/n) = — exp(- — ^2+^2 + ^2 ^ 

^l^ar(yoilm) = -— exp(- — ,2 + ,2 + ,2 ■ 



-g^Var{yoi \yii) = (1 - exp( —)) <( 2 - (1 - cxp(-- 




- (a|+a2+,2)2-{2(- 

respectively. It is straightforward to obtain that V^ar(?/oi |yii) is increasing when rfoi increases, or 
A decreases, or cr^ increases. We next show these properties also hold for V^ar(?/oi|yii, 2/12)- By 
TheoremlU T^ar(?/oi|yii, 2/12) can also be written as: 



Var{yQi\yii,yi2) = (1 - exp( —))(J^ < 2 



The corresponding first partial derivatives are given as follows: 

5 ... I , 2 doi, 2A + exp(-^) 
-Var[yoi\yii,yi2) = —)a^ 



d I , ^ 2doi doi. 2^ + exp(-^) 
-^Var{yoi\yii,yi2) = r^cxp( —)a^ 



and 



dX A2 A ' " l + A 



-^Var{yoi\yii,yi2) = (1 - exp(-^)) |2 - (1 - exp(-^))^(ciA - C2C3) 

l-exp(-i^) 
> ^2 C4, 

respectively, where A = J^i^^^^^l^J,-^ , ci = cr| + 2cr| + ct^, cs cr| + cr|, C3 = cr|ci + cr2(cr2 ^ ^2)^ 
and C4 = a2ci(2a^ + 3^2 + o^) + alc2{aj + a2)(3f72 + Qaj + Aa^) + cHaj + ajf. 



A.2. Results for Section\4l\ 

The joint posterior distribution for xi;t, A and is given by 

T 

p{xi:T, A, cr^|yi:T) = P(A, cr^)p(xT|A, CT^, J/1:t) J]^ p(xT-t |xT-t+l , A, cr^, yi:T) 

T 

= p{xi:t\X, o-^, ?;i:t)p(o'^|A, yi:T)p(A|yT)- 

Suppose p(A, CT'^) = p(A)p(ct^), that is, the priors for A and are independent of other. 
The joint posterior distribution for A and can be written as foUows: 

t=i [ ^ t=i J 

If the prior for is an inverse gamma distribution with shape parameter a and scale parameter 
P, then the posterior distribution for is also an inverse gamma distribution with shape parameter 
a + ^ and scale parameter /3 + ^ X^tLi ^t'Qt^^t- 

Hence, the posterior density for A can be written as follows: 

p(A,cr^|yi:T) 



piMyi-.r) 



p{a^\X,yi..T) 



1 ^ 



-, -{a+nT/2} 



Therefore, the posterior density for Xi-x is given by 

T 

p(a:i:T|A, ct^,?/1:t) = p(xt|A, ct^, ?/i:t) JJp(xT-t|xT-t+i, A, cr^, ?/1:t)- 

t=l 



/A.3. Results for Section\5[ 

Given the values of the phase parameters, range and variance parameters and the observations 
until time t, the joint distribution of afj, ait is 



"It 



N 



«i,t-i 



J:*{9) ^exp{~V*/e} = 



where 

with T,l^{9) a scalar, Si2(^) a 1 by n vector, and S22(^) a n by n matrix. We use V* to denote 
the new distance matrix for the unknown site s and the monitoring stations si, . . . , s„. 
We then have the conditional posterior distribution of af^ as follows: 



(ttitlaf t_i,ait,ai,t-i,yt, A,ct2) ^ iV[af + i;j2(Ai)S22(Ai) H"it - "i.t-i), 

aV2(Sli(Ai) - Et2(Ai)S52(Ai)-iS5i(Ai))]. 

Similarly, the conditional posterior distribution for is 

("2tl"2t-l>"2t,a2,t-l,yt,A,Cr2) ^ iV[a2t-l + 5]l2(A2)S22(A2)"H"2t -a2,t-l), 

aV|(S|i(A2) - St2(A2)S52(A2)-^S5i(A2))]. 



(38) 



(39) 
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Using the observation equation as in Model ([T]) , we have the conditional predictive distribution 
for yf as follows: 

(y||yt,Q;ft,Q;|t,ait,Q;2t,/3t, A, 0-2) ~ N[Pt + Sitiai)alf + S2t{a2)a^t 

+Et2(A)s;2(A)~Hyt-i«A .40. 

-Sitiai)ait - S2t{a2)a2t), 
aH^uiX)-^l2m*22W-'^*2iim- 



The software is in http:/ /enviro.stat.ubc.ca 
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